BioKleisli: A Digital Library for Biomedical Researchers

Authors

  • S. B. Davidson
  • C. Overton
  • V. Tannen
  • L. Wong
Abstract

Data of interest to biomedical researchers associated with the Human Genome Project (HGP) is stored all over the world in a number of different electronic data formats and is accessible through a variety of interfaces and retrieval languages. These data sources include conventional relational databases with SQL interfaces, formatted text files on top of which indexing is provided for efficient retrieval (ASN.1-Entrez), and binary files that can be interpreted textually or graphically via special purpose interfaces (ACeDB). Researchers within the HGP want to combine data from these different data sources, add value through sophisticated data analysis techniques (such as the biosequence comparison software BLAST and FASTA), and view it using special purpose scientific visualization tools. However, currently there are no commercial tools for enabling such an integrated digital library, and a fundamental barrier to developing such tools appears to be one of language design and optimization: the data formats and software packages found throughout the HGP contain a number of data types not available in conventional databases, such as lists, variants and arrays; furthermore, these types may be deeply nested. We present in this paper a framework for providing read access to multiple data sources with complex structured data, and illustrate its use in an application called BioKleisli which accesses data sources critical to the HGP. The three primary components of this framework are: (1) a powerful language for querying and transforming complex structured data; (2) an extensible architecture for implementing the query primitives; and (3) optimization techniques that extend many known techniques to these more complex data types.
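To make the data-model point concrete, the following is a minimal Python sketch of a record that nests a list of variants, together with a query that flattens it into a flat, relational-style result. It is not BioKleisli's query language or API; the `Entry`, `Journal` and `Unpublished` types, the `journal_titles_by_keyword` function and the sample records are all hypothetical, chosen only to show why nested lists and variants do not map directly onto conventional flat tables.

```python
from dataclasses import dataclass
from typing import Union


# A "variant" type: a citation is either a journal article or an
# unpublished note. Both classes are hypothetical illustrations.
@dataclass
class Journal:
    title: str
    year: int


@dataclass
class Unpublished:
    note: str


Citation = Union[Journal, Unpublished]


# A nested record: each entry carries a list of keywords and a list of
# variant-typed citations, which a flat relational row cannot hold directly.
@dataclass
class Entry:
    accession: str
    keywords: list[str]
    citations: list[Citation]


def journal_titles_by_keyword(entries: list[Entry], keyword: str) -> list[tuple[str, str]]:
    """Flatten nested entries into (accession, journal title) pairs."""
    return [
        (e.accession, c.title)
        for e in entries
        if keyword in e.keywords       # filter on a nested list
        for c in e.citations           # unnest the citation list
        if isinstance(c, Journal)      # select one case of the variant
    ]


# Made-up sample data, for illustration only.
entries = [
    Entry("X001", ["kinase", "human"],
          [Journal("J. Example Biol.", 1994), Unpublished("in preparation")]),
    Entry("X002", ["mouse"], [Journal("Example Genomics", 1995)]),
    Entry("X003", ["human"], []),
]

result = journal_titles_by_keyword(entries, "human")
```

The comprehension performs three steps in one expression: it filters on a nested list, unnests a second list, and selects one case of a variant. Queries over sources such as the ones named in the abstract need to express exactly these kinds of steps.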

Similar resources

Research Paper: ALICE: An Algorithm to Extract Abbreviations from MEDLINE

OBJECTIVE To help biomedical researchers recognize dynamically introduced abbreviations in biomedical literature, such as gene and protein names, we have constructed a support system called ALICE (Abbreviation LIfter using Corpus-based Extraction). ALICE aims to extract all types of abbreviations with their expansions from a target paper on the fly. METHODS ALICE extracts an abbreviation and ...

S-RAD: A Simple and Robust Abbreviation Dictionary

The Simple and Robust Abbreviation Dictionary (S-RAD) provides an easy to implement, high performance tool for the construction of a biomedical symbol dictionary. We describe and evaluate the algorithms of the system and apply it to the MEDLINE document set. The resulting dictionary represents a useful tool to researchers and provides programmers with a mechanism to disambiguate abbreviation sy...

ALICE: An Algorithm to Extract Abbreviations from MEDLINE

Methods: ALICE extracts an abbreviation and its expansion from the literature by using heuristic pattern-matching rules. This system consists of three phases and potentially identifies valid 320 abbreviation-expansion patterns as combinations of the rules. Results: It achieved 95% recall and 97% precision on randomly selected titles and abstracts from the MEDLINE database. Conclusion: ALICE ext...

SaRAD: a Simple and Robust Abbreviation Dictionary

MOTIVATION Due to recent interest in the use of textual material to augment traditional experiments it has become necessary to automatically cluster, classify and filter natural language information. RESULTS The Simple and Robust Abbreviation Dictionary (SaRAD) provides an easy to implement, high performance tool for the construction of a biomedical symbol dictionary. The algorithms, applied ...

Research Paper: Creating an Online Dictionary of Abbreviations from MEDLINE

OBJECTIVE The growth of the biomedical literature presents special challenges for both human readers and automatic algorithms. One such challenge derives from the common and uncontrolled use of abbreviations in the literature. Each additional abbreviation increases the effective size of the vocabulary for a field. Therefore, to create an automatically generated and maintained lexicon of abbrevi...

Publication date: 1996